Abstract

Wheeler's question 'why the quantum' has two aspects: why is the world quan- 
tum and not classical, and why is it quantum rather than superquantum, i.e., why 
the Tsirelson bound for quantum correlations? I discuss a remarkable answer 
to this question proposed by Pawlowski et al [6], who provide an information-theoretic
derivation of the Tsirelson bound from a principle they call 'information
causality.' 



1 Introduction 

In a remarkable information-theoretic derivation of the Tsirelson bound for quantum
correlations, Pawlowski et al [6] obtain the bound from a principle they call
'information causality.' Here I review the original derivation and the
information-theoretic principle involved, and consider the significance of the result.

Einstein's special theory of relativity follows from just two principles: the light 
postulate and the principle of relativity. In a seminal paper [7], Popescu and Rohrlich
asked whether quantum mechanics follows from relativistic causality, the principle that 
causal processes or signals cannot propagate outside the light cone, and nonlocality in 
the sense of Bell's theorem [2]. They showed that it does not: quantum mechanics is
only one of a class of theories consistent with these two principles. 

To see this, consider a 'nonlocal box,' a hypothetical device proposed by Popescu 
and Rohrlich, now called a 'Popescu-Rohrlich box' or PR-box. A PR-box has two 
inputs, $a \in \{0, 1\}$ and $b \in \{0, 1\}$, and two outputs, $A \in \{0, 1\}$ and $B \in \{0, 1\}$ (see footnote 1), and
is defined by the following correlations between inputs and outputs:

$$A \oplus B = a \cdot b \tag{1}$$

where $\oplus$ is addition mod 2, i.e.,

(i) same outputs (i.e., 00 or 11) if the inputs are 00 or 01 or 10

(ii) different outputs (i.e., 01 or 10) if the inputs are 11

together with a 'no signaling' constraint.

1 In a simulation of PR-box correlations by classical or quantum correlations, inputs correspond to
observables measured and outputs to measurement outcomes represented by real numbers, so it might seem
more appropriate to use A, B for inputs and a, b for outputs. I follow the notation of Pawlowski et al [6]
here, since this is the result I discuss in detail below.

A PR-box is bipartite and nonlocal in the sense that the a-input and A-output can be
separated from the b-input and B-output by any distance without altering the
correlations. For convenience, we can think of the a-input as controlled by Alice, who
monitors the A-output, and the b-input as controlled by Bob, who monitors the B-output.
If we want the correlations of a PR-box to be consistent with relativistic causality, they
should satisfy a 'no signaling' constraint: no information should be available in the
marginal probabilities of Alice's outputs about alternative input choices made by Bob,
and conversely, i.e.,

$$\sum_{B \in \{0,1\}} p(A, B \mid a, b) = p(A \mid a), \quad A, a, b \in \{0, 1\} \tag{2}$$

$$\sum_{A \in \{0,1\}} p(A, B \mid a, b) = p(B \mid b), \quad B, a, b \in \{0, 1\} \tag{3}$$

Note that 'no signaling' is not a relativistic constraint per se; it is simply a constraint on
the marginal probabilities. But if this constraint is not satisfied, instantaneous (hence 
superluminal) signaling is possible, i.e., 'no signaling' is a necessary condition for 
relativistic causality. 

It follows from (1) and 'no signaling' that the correlations are as in Table 1:



               a = 0                              a = 1

b = 0    p(00|00) = 1/2   p(10|00) = 0      p(00|10) = 1/2   p(10|10) = 0
         p(01|00) = 0     p(11|00) = 1/2    p(01|10) = 0     p(11|10) = 1/2

b = 1    p(00|01) = 1/2   p(10|01) = 0      p(00|11) = 0     p(10|11) = 1/2
         p(01|01) = 0     p(11|01) = 1/2    p(01|11) = 1/2   p(11|11) = 0

Table 1: PR-box correlations

The probability p(00|00) is to be read as p(A = 0, B = 0 | a = 0, b = 0), and the
probability p(01|10) is to be read as p(A = 0, B = 1 | a = 1, b = 0), etc. (I drop the
commas for ease of reading; the first two slots in p(· · | · ·) before the
conditionalization sign '|' represent the two possible outputs for Alice and Bob, respectively, and
the second two slots after the conditionalization sign represent the two possible inputs
for Alice and Bob, respectively.) Note that the sum of the probabilities in each square
cell of the array in Table 1 is 1, and that the marginal probability of 0 for Alice or for
Bob is obtained by adding the probabilities in the left column of each cell or the top
row of each cell, respectively, and the marginal probability of 1 is obtained for Alice or
for Bob by adding the probabilities in the right column of each cell or the bottom row
of each cell, respectively. One could define a PR-box as exhibiting the correlations in
Table 1, which are 'no signaling,' rather than in terms of the condition $A \oplus B = a \cdot b$
and the 'no signaling' constraint.
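The two properties just described (normalization within each cell and uniform, input-independent marginals) can be checked mechanically. The following Python sketch is only an illustration: the function names are hypothetical and the PR-box is represented simply as the probability table p(A, B | a, b) fixed by relation (1).

```python
from itertools import product

BITS = (0, 1)

def pr_box(A, B, a, b):
    """p(A, B | a, b) for a PR-box: 1/2 when A xor B equals a.b, else 0."""
    return 0.5 if (A ^ B) == (a & b) else 0.0

def alice_marginal(A, a, b):
    """Marginal probability of Alice's output A (sum over Bob's output B)."""
    return sum(pr_box(A, B, a, b) for B in BITS)

def bob_marginal(B, a, b):
    """Marginal probability of Bob's output B (sum over Alice's output A)."""
    return sum(pr_box(A, B, a, b) for A in BITS)

# Each cell of the table (one pair of inputs) sums to 1.
normalized = all(
    sum(pr_box(A, B, a, b) for A, B in product(BITS, BITS)) == 1.0
    for a, b in product(BITS, BITS)
)

# Alice's marginals do not depend on b, and Bob's do not depend on a.
no_signaling = all(
    alice_marginal(A, a, 0) == alice_marginal(A, a, 1) == 0.5
    for A, a in product(BITS, BITS)
) and all(
    bob_marginal(B, 0, b) == bob_marginal(B, 1, b) == 0.5
    for B, b in product(BITS, BITS)
)

print(normalized, no_signaling)
```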






Note that a PR-box functions in such a way that if Alice inputs a 0 or a 1, her output
is 0 or 1 with probability 1/2, irrespective of Bob's input, and irrespective of whether
Bob inputs anything at all. Similarly for Bob. The requirement is simply that whenever
there are in fact two inputs, the inputs and outputs are correlated according to (1). A
PR-box can function only once, so to get the statistics for many pairs of inputs one has 
to use many PR-boxes. This avoids the problem of selecting the 'corresponding' input 
pairs for different inputs at various times, which would depend on the reference frame. 
In this respect, a PR-box is like a quantum system: after a system has responded to a 
measurement (produced an output for an input), the system is no longer in the same 
quantum state, and one has to use many systems prepared in the same quantum state to 
exhibit the probabilities associated with a given quantum state. 

What is the optimal probability that Alice and Bob can simulate a PR-box, suppos- 
ing they are allowed certain resources? 

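In units where $A = \pm 1$, $B = \pm 1$ (see footnote 2),

$$\langle 00 \rangle = p(\text{same output} \mid 00) - p(\text{different output} \mid 00) \tag{4}$$

so:

$$p(\text{same output} \mid 00) = \frac{1 + \langle 00 \rangle}{2} \tag{5}$$

$$p(\text{different output} \mid 00) = \frac{1 - \langle 00 \rangle}{2} \tag{6}$$

and similarly for input pairs 01, 10, 11.

It follows that the probability of successfully simulating a PR-box is given by:

$$p(\text{successful sim}) = \frac{1}{4}\big(p(\text{same output} \mid 00) + p(\text{same output} \mid 01) + p(\text{same output} \mid 10) + p(\text{different output} \mid 11)\big) \tag{7}$$

$$= \frac{1}{2}\Big(1 + \frac{K}{4}\Big) = \frac{1}{2}(1 + E) \tag{8}$$

where $K = \langle 00 \rangle + \langle 01 \rangle + \langle 10 \rangle - \langle 11 \rangle$ is the Clauser-Horne-Shimony-Holt (CHSH)
correlation.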

Bell's locality argument in the Clauser-Horne-Shimony-Holt version [4] shows that
if Alice and Bob are limited to classical resources, i.e., if they are required to
reproduce the correlations on the basis of shared randomness or common causes established
before they separate (after which no communication is allowed), then $|K_C| \le 2$, i.e.,
$|E| \le \frac{1}{2}$, so the optimal probability of successfully simulating a PR-box is
$\frac{1}{2}(1 + \frac{1}{2}) = \frac{3}{4}$.

If Alice and Bob are allowed to base their strategy on shared entangled states
prepared before they separate, then the Tsirelson bound for quantum correlations requires
that $|K_Q| \le 2\sqrt{2}$, i.e., $|E| \le \frac{1}{\sqrt{2}}$, so the optimal probability of successful simulation
limited by quantum resources is $\frac{1}{2}(1 + \frac{1}{\sqrt{2}}) \approx .85$.

2 It is convenient to change units here to relate the probability to the usual expression for the
Clauser-Horne-Shimony-Holt correlation, where the expectation values are expressed in terms of ±1 values for A
and B (the relevant observables). Note that 'same output' or 'different output' mean the same thing whatever
the units, so the probabilities p(same output|AB) and p(different output|AB) take the same values whatever
the units, but the expectation value $\langle AB \rangle$ depends on the units for A and B.

Clearly, the 'no signaling' constraint (or relativistic causality) does not rule out
simulating a PR-box with a probability greater than $\frac{1}{2}(1 + \frac{1}{\sqrt{2}})$. As Popescu and Rohrlich
observe, there are possible worlds described by 'superquantum' theories that allow
nonlocal boxes with 'no signaling' correlations stronger than quantum correlations, in
the sense that $\frac{1}{\sqrt{2}} < E \le 1$. The correlations of a PR-box saturate the CHSH inequality
(E = 1), and so represent a limiting case of 'no signaling' correlations.
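The three success probabilities quoted above (classical, quantum, PR-box) can be reproduced with a short calculation. The Python sketch below is illustrative and its names are hypothetical. It assumes that, because shared randomness is a mixture of deterministic strategies, the classical optimum is attained by a deterministic assignment of outputs to inputs, and it simply takes the Tsirelson value $E = 1/\sqrt{2}$ as given rather than deriving it.

```python
from itertools import product
from math import sqrt

def chsh_value(alice_outputs, bob_outputs):
    """CHSH correlation K for a deterministic strategy.

    alice_outputs[a] and bob_outputs[b] are output bits; a pair of equal
    outputs contributes +1 to the correlation and a pair of different
    outputs contributes -1 (the +/-1 units of footnote 2).
    """
    def corr(a, b):
        return 1 if alice_outputs[a] == bob_outputs[b] else -1
    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)

def success_probability(E):
    """Probability of simulating a PR-box, equation (8), with E = K/4."""
    return 0.5 * (1 + E)

# All 4 x 4 deterministic strategies: one output bit per input value.
strategies = list(product((0, 1), repeat=2))
K_classical = max(chsh_value(fa, fb) for fa in strategies for fb in strategies)

p_classical = success_probability(K_classical / 4)
p_quantum = success_probability(1 / sqrt(2))   # Tsirelson bound, assumed
p_pr_box = success_probability(1.0)

print(K_classical, p_classical, round(p_quantum, 4), p_pr_box)
```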

We see now that Wheeler's question 'why the quantum' has two aspects: why is 
the world quantum and not classical, and why is it quantum rather than superquantum, 
i.e., why the Tsirelson bound? In the following section, I discuss a remarkable answer 
to this question proposed by Pawlowski et al [6].

2 Information Causality 

Pawlowski et al [6] consider a condition they call 'information causality,' that the
information gain for Bob about an unknown data set of Alice, given all his local resources
and m classical bits communicated by Alice, is at most m bits (see footnote 3). They remark that
the 'no-signaling' condition is just information causality for m = 0: if Alice
communicates nothing to Bob, then there is no information in the statistics of Bob's outputs
about Alice's data set. Pawlowski et al show that the Tsirelson bound, $|E| \le \frac{1}{\sqrt{2}}$,
follows from this condition.

To see how they arrive at this startling result, it is convenient to consider the
following game (related to oblivious transfer and communication complexity problems;
see [9], [10], [3] and Section 4): At each round of the game, Alice receives N random and
independent bits $a = (a_0, a_1, \ldots, a_{N-1})$. Bob, separated from Alice, receives a value
of a random uniformly distributed variable $b \in \{0, 1, \ldots, N-1\}$. Alice can send one
classical bit to Bob with the help of which Bob is required to guess the value of the b-th
bit in Alice's list, $a_b$, for some value of $b \in \{0, \ldots, N-1\}$. We assume that Alice and
Bob are allowed to communicate and plan a mutual strategy before the game starts, but
once the game starts the only communication between them is the one classical bit that
Alice is allowed to send to Bob at each round of the game. They win a round if Bob
correctly guesses the b-th bit for the round. They win the game if Bob always guesses
correctly over any succession of rounds. Note that Alice must decide on the bit she
sends to Bob at each round of the game independently of the value of b, which is given
to Bob at each round and is unknown to Alice.

Clearly, Bob will be able to correctly guess the value of one of Alice's bits, assum- 
ing they agree in advance about the index k of the bit Alice sends at each round, but 
Bob's guess will be at chance when $b \ne k$.

Now, suppose Alice and Bob are equipped with a supply of shared PR-boxes.
Pawlowski et al show that there is a strategy that will allow Alice and Bob to win
the game, i.e., for any round, and for any $b \in \{0, 1, \ldots, N-1\}$, Bob will be able to
correctly guess the value of any designated bit in Alice's list $a_0, a_1, \ldots, a_{N-1}$.

3 The restriction to the communication of classical bits is essential here. Recall that entanglement
correlations can be exploited to allow Alice to send Bob two classical bits by communicating just one quantum
bit.

Consider first the simplest case N = 2, where Alice receives two bits, $a_0, a_1$. The
strategy in this case involves a single shared PR-box. Alice inputs $a_0 \oplus a_1$ into her part
of the box (i.e., $a = a_0 \oplus a_1$) and obtains the output A. She sends the bit $x = a_0 \oplus A$
to Bob. Bob inputs the value of b, i.e., 0 or 1, into his part of the box and obtains the
output B. He guesses $x \oplus B = a_0 \oplus A \oplus B$.

Now, the box functions in such a way that $A \oplus B = a \cdot b = (a_0 \oplus a_1) \cdot b$. So Bob's
guess is $x \oplus B = a_0 \oplus A \oplus B = a_0 \oplus ((a_0 \oplus a_1) \cdot b)$. It follows that if b = 0, Bob
correctly guesses $a_0$, and if b = 1, Bob correctly guesses $a_0 \oplus a_0 \oplus a_1 = a_1$.
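The N = 2 strategy can be checked exhaustively. The following Python sketch is illustrative only: the function names are hypothetical, and the PR-box is modelled centrally by fixing Alice's output A and setting Bob's output to A xor (a and b). This reproduces relation (1) but is, of course, a bookkeeping device and not a physical implementation of a nonlocal box.

```python
from itertools import product

def pr_box_bob_output(a, b, A):
    """Bob's output B for a PR-box, given the inputs a, b and Alice's output A.

    Relation (1), A xor B = a.b, fixes B once A is known.
    """
    return A ^ (a & b)

def bob_guess_n2(a0, a1, b, A):
    """Bob's guess in the N = 2 game for Alice's bits (a0, a1) and index b."""
    x = a0 ^ A                              # the one classical bit Alice sends
    B = pr_box_bob_output(a0 ^ a1, b, A)    # Alice's box input is a0 xor a1
    return x ^ B

# Check every choice of Alice's bits, Bob's index b, and Alice's random output A.
always_correct = all(
    bob_guess_n2(a0, a1, b, A) == (a0, a1)[b]
    for a0, a1, b, A in product((0, 1), repeat=4)
)
print(always_correct)
```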

Suppose Alice receives four bits, $a_0, a_1, a_2, a_3$ (N = 4). Bob's random variable
labeling the bit he has to guess takes four values, b = 0, 1, 2, 3, and can be specified by
two bits, $b_0, b_1$:

$$b = b_0 2^0 + b_1 2^1 = b_0 + 2 b_1$$

The strategy in this case involves an inverted pyramid of PR-boxes: two shared
PR-boxes, L and R, at the first stage, and one shared PR-box at the final second stage.
Alice inputs $a_0 \oplus a_1$ into the L box, and $a_2 \oplus a_3$ into the R box. Bob inputs $b_0$ into both
the L and R boxes and obtains the output $B_0$ (the input to one of these boxes will be
irrelevant, depending on what bit Bob is required to guess; see below). At the second
stage, Alice inputs $(a_0 \oplus A_L) \oplus (a_2 \oplus A_R)$ into the shared PR-box, where $A_L$ is the
Alice-output of the L box and $A_R$ is the Alice-output of the R box, and obtains the
output A. Bob inputs $b_1$ into this box and obtains the output $B_1$. Alice then sends Bob
the bit $x = a_0 \oplus A_L \oplus A$.

Now, Bob could correctly guess either $a_0 \oplus A_L$ or $a_2 \oplus A_R$, using the elementary
N = 2 strategy, as $x \oplus B_1 = a_0 \oplus A_L \oplus A \oplus B_1$. Here $A \oplus B_1 = (a_0 \oplus A_L \oplus a_2 \oplus A_R) \cdot b_1$.
If $b_1 = 0$, Bob would guess $a_0 \oplus A_L$. If $b_1 = 1$, Bob would guess $a_2 \oplus A_R$.

So if Bob is required to guess the value of $a_0$ (i.e., $b_0 = 0$, $b_1 = 0$) or $a_1$ (i.e.,
$b_0 = 1$, $b_1 = 0$), the bits that determine the input to the PR-box L, he guesses $a_0 \oplus A_L \oplus A \oplus B_1 \oplus B_0$,
where $B_0$ is the Bob-output of the L box. Then:

$$a_0 \oplus A_L \oplus A \oplus B_1 \oplus B_0 = a_0 \oplus A_L \oplus B_0 = a_0 \oplus (a_0 \oplus a_1) \cdot b_0 \tag{9}$$

If $b_0 = 0$, Bob correctly guesses $a_0$; if $b_0 = 1$, Bob correctly guesses $a_1$.

If Bob is required to guess the value of $a_2$ (i.e., $b_0 = 0$, $b_1 = 1$) or $a_3$ (i.e.,
$b_0 = 1$, $b_1 = 1$), the bits that determine the input to the PR-box R, he guesses $a_0 \oplus A_L \oplus A \oplus B_1 \oplus B_0$, where
$B_0$ is the Bob-output of the R box. Then:

$$a_0 \oplus A_L \oplus A \oplus B_1 \oplus B_0 = a_2 \oplus A_R \oplus B_0 = a_2 \oplus (a_2 \oplus a_3) \cdot b_0 \tag{10}$$

If $b_0 = 0$, Bob correctly guesses $a_2$; if $b_0 = 1$, Bob correctly guesses $a_3$.

These strategies are winning strategies for N = 2 and N = 4 (the game for N = 1
is trivial). Clearly, the strategy for N = 4 is also a strategy for N = 3 (there is just
one less value of b that Bob has to worry about). By adding more stages (levels) to the
inverted pyramid, one obtains a strategy for N = 8 (four shared PR-boxes at the first
stage, two shared PR-boxes at the next stage, and one shared PR-box at the third and
final stage), and so on. This is also a strategy for 4 < N < 8, so there is a strategy for
any N.
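The inverted-pyramid strategy for $N = 2^n$ can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' construction verbatim: the class `PRBox` and the functions `alice_encode` and `bob_guess` are hypothetical names, each box is simulated centrally (Alice's output is a fair coin and Bob's output is computed from relation (1)), and Bob queries only the one box per stage that is relevant to his index b, whereas in the text he feeds the same bit into every box of a stage and ignores the irrelevant outputs.

```python
import random

class PRBox:
    """One-shot PR-box, simulated centrally so that A xor B = a.b."""

    def __init__(self, rng):
        self._A = rng.randint(0, 1)   # Alice's output: a fair coin
        self._a = None

    def alice(self, a):
        self._a = a
        return self._A

    def bob(self, b):
        # Bob's marginal is also uniform, because A is uniform.
        return self._A ^ (self._a & b)

def alice_encode(bits, rng):
    """Run Alice's side of the pyramid; len(bits) must be a power of 2.

    Returns the single bit x that Alice sends and the boxes used, by stage.
    """
    level, stages = list(bits), []
    while len(level) > 1:
        stage = [PRBox(rng) for _ in range(len(level) // 2)]
        # Box i receives level[2i] xor level[2i+1]; the bit passed on to
        # the next stage is level[2i] xor (Alice's output of box i).
        level = [level[2 * i] ^ box.alice(level[2 * i] ^ level[2 * i + 1])
                 for i, box in enumerate(stage)]
        stages.append(stage)
    return level[0], stages

def bob_guess(x, stages, b):
    """Bob's guess for bit number b: x xor B_0 xor B_1 xor ... ."""
    guess = x
    for j, stage in enumerate(stages):
        box = stage[b >> (j + 1)]          # the box on the path to bit b
        guess ^= box.bob((b >> j) & 1)     # input: j-th binary digit of b
    return guess

rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(8)]
results = []
for b in range(8):
    x, stages = alice_encode(bits, rng)    # fresh boxes for every round
    results.append(bob_guess(x, stages, b) == bits[b])
print(all(results))
```

For an N that is not a power of 2, the list can be padded with dummy bits up to the next power of 2, in line with the remark that the N = 4 strategy also serves for N = 3.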

The game can be modified to allow Alice to send m classical bits of information 
to Bob at each round, in which case Bob is required to guess the values of any set of 
m bits in Alice's list of N bits. In this case, Alice and Bob simply apply the above 
strategy for any N with m inverted pyramids of PR-boxes, one for each bit in the set 
of bits Bob is required to guess. 

We have seen that Alice and Bob can win this game if they share PR-boxes (E =
1). What if they share non-signaling (NS) boxes with any 'no signaling' correlations
corresponding to $|E| \le 1$, such as classical correlations ($|E| \le \frac{1}{2}$), or the correlations
of entangled quantum states ($|E| \le \frac{1}{\sqrt{2}}$), or superquantum 'no signaling' correlations
($\frac{1}{\sqrt{2}} < E \le 1$)?

The probability of simulating a PR-box with a NS-box is $\frac{1}{2}(1 + E)$, where E
depends on the NS-box (the nature of the correlations). Consider the N = 4 game where
Alice and Bob share NS-boxes, and Alice is allowed to communicate one bit to Bob.
Bob's guess $x \oplus B_1 \oplus B_0$ will be correct if $B_1$ and $B_0$ are both correct or both incorrect
(since $B_1 \oplus B_0$ will be the same in either case).

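The probability of being correct at both stages is:

$$\frac{1}{2}(1 + E) \cdot \frac{1}{2}(1 + E) = \frac{1}{4}(1 + E)^2 \tag{11}$$

The probability of being incorrect at both stages is:

$$\Big(1 - \frac{1}{2}(1 + E)\Big) \cdot \Big(1 - \frac{1}{2}(1 + E)\Big) = \frac{1}{2}(1 - E) \cdot \frac{1}{2}(1 - E) = \frac{1}{4}(1 - E)^2 \tag{12}$$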

So the probability $P_k$ that Bob guesses correctly, i.e., the probability that Bob's guess $\beta$ equals $a_k$ when
b = k, is:

$$P_k = \frac{1}{4}(1 + E)^2 + \frac{1}{4}(1 - E)^2 = \frac{1}{2}(1 + E^2) \tag{13}$$

In the general case $N = 2^n$, Bob guesses correctly if he makes an even number of
errors over the n stages ($B_0, B_1, B_2, \ldots$) and the probability is:

$$P_k = \frac{1}{2^n}(1 + E)^n + \frac{1}{2^n}\sum_{j=1}^{\lfloor n/2 \rfloor} \binom{n}{2j}(1 - E)^{2j}(1 + E)^{n-2j} = \frac{1}{2}(1 + E^n) \tag{14}$$

where $\lfloor n/2 \rfloor$ denotes the integer value of $n/2$. For example, if n = 3, the probability of
being correct at each stage is:

$$\frac{1}{2}(1 + E) \cdot \frac{1}{2}(1 + E) \cdot \frac{1}{2}(1 + E) \tag{15}$$

and the probability of being incorrect at two out of the three stages (i.e., at $B_0, B_1$ or
$B_0, B_2$ or $B_1, B_2$) is:

$$3 \cdot \frac{1}{2}(1 - E) \cdot \frac{1}{2}(1 - E) \cdot \frac{1}{2}(1 + E) \tag{16}$$

so the probability that Bob guesses correctly is:

$$P_k = \frac{1}{8}(1 + E)^3 + \frac{3}{8}(1 - E)^2(1 + E) = \frac{1}{2}(1 + E^3) \tag{17}$$
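The closed form $\frac{1}{2}(1 + E^n)$ can be compared numerically with the sum over even numbers of errors. The Python sketch below is illustrative (the function names are hypothetical); it assumes, as in the text, that the n stages succeed independently, each with probability $\frac{1}{2}(1 + E)$.

```python
from math import comb, sqrt

def p_even_errors(E, n):
    """Probability of an even number of errors over n independent stages."""
    p_ok = (1 + E) / 2     # a stage reproduces the PR-box output
    p_err = (1 - E) / 2    # a stage gives the wrong output
    return sum(comb(n, j) * p_err**j * p_ok**(n - j)
               for j in range(0, n + 1, 2))

def p_closed_form(E, n):
    """The closed form (1 + E^n) / 2 of equation (14)."""
    return 0.5 * (1 + E**n)

max_gap = max(
    abs(p_even_errors(E, n) - p_closed_form(E, n))
    for E in (0.5, 1 / sqrt(2), 0.725, 1.0)
    for n in range(1, 11)
)
print(max_gap < 1e-12)
```

The agreement reflects the identity $\frac{1}{2}[(p + q)^n + (p - q)^n]$ for the even terms of a binomial expansion, with $p + q = 1$ and $p - q = E$.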
3 The Tsirelson bound



In the game considered above, Alice has a list of N bits and Bob has to guess an
arbitrarily selected one of these bits, b = k. If Bob knows the value of the bit he has
to guess, $P_k = 1$. The binary entropy of $P_k$ is defined as $h(P_k) = -P_k \log P_k - (1 - P_k) \log(1 - P_k)$,
with logarithms to base 2, so in this case $h(P_k) = 0$. If Bob has no information about the bit he has to
guess, $P_k = 1/2$, i.e., his guess is at chance, and $h(P_k) = 1$.

If Alice sends Bob one classical bit of information, information causality requires 
that Bob's information about the N unknown bits increases by at most one bit. So if 
the bits in Alice's list are unbiased and independently distributed, Bob's information 
about an arbitrary bit b = k in the list cannot increase by more than 1/N bits, i.e., for 
Bob's guess about an arbitrary bit in Alice's list, the binary entropy $h(P_k)$ is at most
1/N closer to 0 than the chance value 1, i.e., $h(P_k) \ge 1 - 1/N$.

It follows that the condition for a violation of information causality in this case can 
be expressed as: 

$$h(P_k) < 1 - \frac{1}{N} \tag{18}$$

or, taking $N = 2^n$, the condition is:

$$h(P_k) < 1 - \frac{1}{2^n} \tag{19}$$

Since $P_k = \frac{1}{2}(1 + E^n)$, we have a violation of information causality when:

$$h\big(\tfrac{1}{2}(1 + E^n)\big) < 1 - \frac{1}{2^n} \tag{20}$$

Pawlowski et al [6] make use of the following inequality:

$$h\big(\tfrac{1}{2}(1 + y)\big) \le 1 - \frac{y^2}{2 \ln 2} \tag{21}$$

where $\ln 2 \approx .693$ is the natural log of 2 (base e). So information causality is violated
if

$$1 - \frac{E^{2n}}{2 \ln 2} < 1 - \frac{1}{2^n} \tag{22}$$

i.e., if

$$(2E^2)^n > 2 \ln 2 \approx 1.386 \tag{23}$$
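The quadratic upper bound on the binary entropy can be spot-checked numerically. The sketch below is an illustration with hypothetical function names; it tests the bound on a grid of values of y in [-1, 1], which is a sanity check rather than a proof.

```python
from math import log, log2

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def quadratic_bound(y):
    """Right-hand side of the bound: 1 - y^2 / (2 ln 2)."""
    return 1 - y * y / (2 * log(2))

grid = [i / 1000 for i in range(-1000, 1001)]
# A small tolerance absorbs rounding at y = 0, where both sides equal 1.
bound_holds = all(
    binary_entropy(0.5 * (1 + y)) <= quadratic_bound(y) + 1e-12
    for y in grid
)
print(bound_holds)
```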
If $2E^2 = 1$, i.e., if $E = E_T = \frac{1}{\sqrt{2}}$ (the Tsirelson bound), the inequality (23) is
not satisfied. Inequality (23) is a sufficient condition for a violation of information causality, but it is
not necessary: even if $(2E_T^2)^n \le 2 \ln 2$, we could still have a violation of information
causality for some n if $h(\frac{1}{2}(1 + E_T^n)) < 1 - \frac{1}{2^n}$. See the Appendix for a proof that
information causality is satisfied for $E = E_T$, i.e., $h(\frac{1}{2}(1 + E_T^n)) \ge 1 - \frac{1}{2^n}$ for any n.

If $E > E_T$, i.e., if $2E^2 = 1 + a$ for some $a > 0$, no matter how small, there is a
violation: $(2E^2)^n \ge 1 + na$ (see footnote 4), but $1 + na > 2 \ln 2 \approx 1.386$ for some n. That is, for
any a, however small, there is a value of n such that $n > \frac{.386}{a}$, hence a value of n for
which information causality is violated.

4 Recall that $(1 + a)^n$ can be expanded as $(1 + a)^n = 1 + na + \frac{n(n-1)}{2!}a^2 + \frac{n(n-1)(n-2)}{3!}a^3 + \cdots$

To appreciate the significance of this result, consider some numbers for E and n.
The condition for a violation of information causality is $h(P_k) < 1 - \frac{1}{2^n}$. Recall that

$$\log_2 x = \frac{\log_{10} x}{\log_{10} 2} \approx \frac{\log_{10} x}{.301}.$$

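Consider first the case where $E = E_T = \frac{1}{\sqrt{2}} \approx .707$, the Tsirelson bound.

When n = 1, Alice has $2^1 = 2$ bits:

$$h(P_k) = -\left(\frac{1}{2}\Big(1 + \frac{1}{\sqrt{2}}\Big)\frac{\log_{10} \frac{1}{2}(1 + \frac{1}{\sqrt{2}})}{.301} + \frac{1}{2}\Big(1 - \frac{1}{\sqrt{2}}\Big)\frac{\log_{10} \frac{1}{2}(1 - \frac{1}{\sqrt{2}})}{.301}\right) \approx .600 \tag{24}$$

There is no violation of information causality because $.600 > 1 - \frac{1}{2^1} = \frac{1}{2}$.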
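When n = 10, Alice has $2^{10} = 1024$ bits:

$$h(P_k) = -\left(\frac{1}{2}\Big(1 + \frac{1}{2^5}\Big)\frac{\log_{10} \frac{1}{2}(1 + \frac{1}{2^5})}{.301} + \frac{1}{2}\Big(1 - \frac{1}{2^5}\Big)\frac{\log_{10} \frac{1}{2}(1 - \frac{1}{2^5})}{.301}\right) \approx .99939 \tag{25}$$

There is still no violation of information causality because $.99939 > 1 - \frac{1}{2^{10}} = 1 - \frac{1}{1024} \approx .9990$.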

Now consider the case where $E > E_T$. Take E = .725 and n = 7. In this case,
there is a violation of information causality:

$$h(P_k) = -\left(\frac{1}{2}(1 + .725^7)\frac{\log_{10} \frac{1}{2}(1 + .725^7)}{.301} + \frac{1}{2}(1 - .725^7)\frac{\log_{10} \frac{1}{2}(1 - .725^7)}{.301}\right) \approx .99208 \tag{26}$$

There is a violation of information causality because $.99208 < 1 - \frac{1}{128} \approx .99218$.
There is no violation for n = 6 because $.9848 > 1 - \frac{1}{64} \approx .9844$.

Note that the inequality (21) has not been used in the above calculations. The only
role of the inequality is to allow one to easily see that information causality is violated
for some value of n if $E > E_T$, i.e., if $2E^2 = 1 + a$ for any $a > 0$. In fact, information
causality could be violated for a lower value of n. In the case above, E = .725,
$a \approx .05125$. Using the inequality, we find that information causality is violated when
$n > \frac{.386}{a}$, i.e., when $n \ge 8$.
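The numerical cases discussed in this section can be recomputed directly. The Python sketch below is illustrative and its function names are hypothetical. It uses exact base-2 logarithms, so the trailing digits of the entropies may differ slightly from values obtained with the rounded constant .301; the comparisons with $1 - \frac{1}{2^n}$ are what matter here.

```python
from math import floor, log, log2, sqrt

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def guess_entropy(E, n):
    """h(P_k) for P_k = (1 + E^n) / 2."""
    return binary_entropy(0.5 * (1 + E**n))

def violates(E, n):
    """Violation condition: h(P_k) < 1 - 1/2^n."""
    return guess_entropy(E, n) < 1 - 1 / 2**n

def first_violation(E, n_max=200):
    """Smallest n <= n_max that violates information causality, else None."""
    for n in range(1, n_max + 1):
        if violates(E, n):
            return n
    return None

def sufficient_n(E):
    """Smallest n with 1 + n*a > 2 ln 2, where a = 2E^2 - 1 > 0.

    This is the sufficient (not necessary) estimate obtained from the
    quadratic entropy bound.
    """
    a = 2 * E * E - 1
    return floor((2 * log(2) - 1) / a) + 1

E_T = 1 / sqrt(2)
print(violates(E_T, 1), violates(E_T, 10))   # Tsirelson bound: no violation
print(first_violation(0.725))                # direct computation
print(sufficient_n(0.725))                   # estimate from the inequality
```

With E = .725 the direct computation finds the first violation one stage earlier than the estimate from the inequality, which illustrates that the inequality gives only a sufficient condition.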

If E is very close to the Tsirelson bound, then n must be very large for a violation
of information causality. For n = 10 and E = .708:

$$h(P_k) = -\left(\frac{1}{2}(1 + .708^{10})\frac{\log_{10} \frac{1}{2}(1 + .708^{10})}{.301} + \frac{1}{2}(1 - .708^{10})\frac{\log_{10} \frac{1}{2}(1 - .708^{10})}{.301}\right) \approx .99938 \tag{27}$$

There is no violation of information causality because $.99938 > 1 - \frac{1}{1024} \approx .9990$.
Using the inequality, with $a = 2(.708)^2 - 1 \approx .0025$, we find that $n > \frac{.386}{a}$, i.e., $n \ge 153$, suffices for a violation of
information causality.

Another way to look at this: If $E = E_T = \frac{1}{\sqrt{2}}$, $P_k = \frac{1}{2}(1 + E^n) \to \frac{1}{2}$ and
$h(P_k) \to 1$ as $n \to \infty$. So, if Alice has a very long list and sends Bob one bit of
information, Bob's ability to correctly guess an arbitrary bit in Alice's list is essentially
at chance if the correlations are bounded by the Tsirelson bound. For a PR-box, E = 1,
$P_k = 1$, $h(P_k) = 0$, so Bob can correctly guess any arbitrary bit in Alice's list.



4 Comments 

The analysis in Section 3 related information causality directly to a condition on the 
binary entropy. In Pawlowski et al [6], the authors relate information causality directly
to a condition on the mutual information between Alice and Bob, and only indirectly 
to the binary entropy: 

Ideally, we wish to define that information causality holds if, after 
transfer of the m-bit message, the mutual information between Alice's data
a and everything that Bob has — that is, the message x and his part B of 
the previously shared correlation — is bounded by m. Intuitively appealing 
though such a definition is, it has the severe issue that it is not theory- 
independent. Specifically, a mutual information expression 'I (a : x,B)' 
has to be defined for a state involving objects from the underlying theory 
(the possibilities include classical correlation, a shared quantum state and 
NS-boxes). It is far from clear whether mutual information can be defined 
consistently for all nonlocal correlations, nor whether such a definition 
would be unique. 

Pawlowski et al denote Bob's output by $\beta$ and quantify the efficiency of Alice's and
Bob's strategy by:

$$I = \sum_{k=0}^{N-1} I(a_k : \beta \mid b = k) \tag{28}$$

where $I(a_k : \beta \mid b = k)$ is the Shannon mutual information between $a_k$ and $\beta$, computed
under the condition that Bob is required to guess the bit b = k. They show that if the
mutual information $I(a : x, B)$ for any 'no signaling' theory satisfies three constraints
(which are satisfied for quantum information and for classical information, a special
case of quantum information):
• consistency with the classical Shannon mutual information when the Alice and
Bob subsystems are both classical 

• the data-processing inequality: any local manipulation of data can only degrade 
information, i.e., acting on one subsystem locally by any admissible transforma- 
tion cannot increase the mutual information 

• the chain rule: $I(A : B, C) = I(A : C) + I(A : B \mid C)$, where $I(A : B \mid C)$ is the
conditional mutual information

then (i) information causality is satisfied, i.e., $I(a : x, B) \le m$, and (ii) $I(a : x, B) \ge I$.

Since $I(a : x, B) > m$ if $I > m$, it follows that information causality is violated
if:

$$I > m \tag{29}$$

So if information causality is satisfied, then $I \le m$, i.e., $I \le m$ is a necessary condition
for information causality. (Note that we could, of course, have $I \le m$ but $I(a : x, B) > m$,
so $I \le m$ is not a sufficient condition for information causality.) As the
authors emphasize, I is fully specified by Alice's and Bob's input and output bits and
is independent of the details of any particular physical theory.

The Shannon mutual information I(X:Y) of two random variables is a measure of 
how much information they have in common: the sum of the information content of the 
two random variables, as measured by the Shannon entropy (in which joint information 
is counted twice), minus their joint information: 

$$I(X : Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y) \tag{30}$$

where $H(X) = -\sum_i p_i \log p_i$ is the Shannon entropy of the random variable X,
$H(X, Y) = -\sum_{i,j} p_{i,j} \log p_{i,j}$ is the joint Shannon entropy of the two random
variables X, Y representing the joint information, and $H(X \mid Y)$ is the conditional entropy:
$H(X \mid Y) = H(X, Y) - H(Y)$. Note that $H(X \mid Y) \le H(X)$, with equality if and only
if X, Y are independent.
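These definitions translate directly into code. The Python sketch below is illustrative and uses hypothetical function names. As an example it takes an unbiased bit and a guess that is correct with probability P regardless of the bit's value (a symmetric-error assumption made here for illustration), in which case the mutual information works out to 1 - h(P).

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities (0 log 0 = 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X:Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    p_xy = [p for row in joint for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

def guess_joint(P):
    """Joint distribution of an unbiased bit and a guess correct with prob. P."""
    return [[P / 2, (1 - P) / 2],
            [(1 - P) / 2, P / 2]]

def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    return entropy([p, 1 - p])

for P in (0.5, 0.75, 1.0):
    print(P, mutual_information(guess_joint(P)), 1 - binary_entropy(P))
```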
So:

$$I = \sum_{k=0}^{N-1} I(a_k : \beta \mid b = k) = \sum_{k=0}^{N-1} \big(H(a_k) + H(\beta) - H(a_k, \beta)\big) \tag{31}$$

where the condition b = k has been omitted for ease of reading.
First note that

$$H(a_k \mid \beta) = H(a_k \oplus \beta \mid \beta) \le H(a_k \oplus \beta) \tag{32}$$

The equality follows because only the probabilities of the different alternatives are
relevant in the calculation of the entropy. In this case, the alternatives are 0 and 1
and, given that $\beta = 0$, the probability that $a_k = 0$ is the same as the probability that
$a_k \oplus \beta = 0$, i.e., that $a_k = \beta$, and the probability that $a_k = 1$ is the same as the
probability that $a_k \oplus \beta = 1$, i.e., that $a_k \ne \beta$; and similarly if $\beta = 1$. The
inequality follows because conditioning cannot increase entropy.
Now:

$$H(a_k \oplus \beta) = h(P_k) \tag{33}$$

so

$$H(a_k \mid \beta) \le h(P_k) \tag{34}$$

It follows that:

$$I(a_k : \beta \mid b = k) \ge H(a_k) - h(P_k) \tag{35}$$

In the case where the bits in Alice's list are unbiased and independently distributed,
$H(a_k) = 1$, so:

$$I(a_k : \beta \mid b = k) \ge 1 - h(P_k) \tag{36}$$

i.e.,

$$I \ge N - \sum_{k=0}^{N-1} h(P_k) \tag{37}$$

and since $P_k = \frac{1}{2}(1 + E^n)$, which is independent of k:

$$I \ge N - N h(P_k) \tag{38}$$

For a PR-box, E = 1, $h(P_k) = 0$, and I = N. If Bob guesses randomly for all
k, then $h(P_k) = 1$, I = 0. So in the case where Alice sends m bits of information to
Bob, $0 \le I \le N$, with a violation of information causality when I > m.

If Alice sends Bob one bit of information, information causality is violated if / > 1, 
i.e., if: 

$$h(P_k) < 1 - \frac{1}{N} \tag{39}$$

or, taking $N = 2^n$, if:

$$h(P_k) < 1 - \frac{1}{2^n} \tag{40}$$

which are, respectively, equations (18) and (19) of Section 3.

Pawlowski et al [6, p. 1101] express the condition of information causality as
follows: 

Formulated as a principle, information causality states: 'the informa- 
tion gain that Bob can reach about a previously unknown to him data set of 
Alice, by using all his local resources and m classical bits communicated 
by Alice, is at most m bits.' The standard no-signalling condition is just
information causality for m = 0.

Stated in this way, the condition seems trivial: of course, if Alice sends Bob m 
bits of information, his information gain is at most m bits, and if m = 0 his
information gain is 0. But implicit in the condition is that Bob's local resources include
the marginal probabilities of correlations between Alice and Bob and the values of the 
correlated variables, and similarly for Alice. The issue concerns the extent to which 
Alice and Bob can exploit previously established correlations between them in such a
way that the m bits of information communicated by Alice to Bob will allow Bob to 
correctly guess an arbitrarily designated set of bits in Alice's data set, which might con- 
tain N > m bits. Of course, without exploiting the correlations, Bob can know some 
specific, previously agreed upon set of m bits and, exploiting classical correlations, i.e., 
previously established shared randomness, Bob can know a different specific set of m 
bits on each occasion that Alice sends him m bits (see footnote 5). The relevant insight is that if the
correlations are PR-box correlations, then Alice can send Bob a set of m bits chosen 
on the basis of the Alice-values of the correlated variables, where Alice and Bob select 
the variables appropriately as the inputs to the PR-boxes, in such a way that Bob can 
correctly guess any arbitrary set of m bits in Alice's data set. In other words, for the 
case m = 1, there is a way of exploiting the PR-box correlations so that the one bit of 
information can be associated with any designated bit in Alice's data set of N bits, for 
any N (this was pointed out already in [11]).

So in the case where the bits in Alice's data set are unbiased and independently 
distributed and Alice sends Bob one bit of information, the PR-box correlations can 
be exploited to achieve $P_k = 1$ for all k, i.e., $h(P_k) = 0$ for all k. The intuition
behind information causality is that this is 'too good to be true,' in fact, that the binary
entropy should be bounded: $h(P_k) \ge 1 - \frac{1}{N}$. Putting it differently, when the bits in
Alice's data set are unbiased and independently distributed, the intuition is that if the
correlations can be exploited to distribute one bit of communicated information among
the N unknown bits in Alice's data set, the amount of information distributed should
be no more than 1/N bits, because there can be no information about the bits in Alice's
data set in the previously established correlations themselves. 

As Pawlowski et al show, for 'no signaling' correlations, $P_k = \frac{1}{2}(1 + E^n)$, where
$N = 2^n$. For classical correlations, $E = \frac{1}{2}$, $h(P_k) \approx .811$ for n = 1. For quantum
correlations, $E = E_T = \frac{1}{\sqrt{2}}$, $h(P_k) \approx .600$ for n = 1, so Alice and Bob can do
better exploiting quantum correlations than they can if they are restricted to classical
correlations. This is the case for any n, but information causality is always satisfied.
The intriguing result by Pawlowski et al is that information causality is violated for
some value of n if $E > E_T$. From this perspective, it is misleading to claim that the 'no
signaling' condition is 'just information causality for m = 0.' If Alice communicates 
no information to Bob, they have no possibility of exploiting correlations to increase 
Bob's access to Alice's data set. The condition of information causality concerns the 
extent to which correlations can be exploited to increase Bob's access to Alice's data 
set, in the sense of improving Bob's ability to correctly guess any arbitrary bit in Alice's 
data set. 

In fact, the term 'information causality' is suggestive in the wrong sense. The 
principle really has nothing to do with causality and is better understood as a constraint 
on the ability of correlations to enhance the information content of communication 
in a distributed task. A more appropriate term would be 'informational neutrality of 
correlations,' and the principle should be formulated as follows: 

Correlations are informationally neutral: insofar as they can be exploited
to allow Bob to distribute information communicated by Alice
among the bits in an unknown data set held by Alice in such a way as
to increase Bob's ability to correctly guess an arbitrary bit in the data set,
they cannot increase Bob's information about the data set by more than the
number of bits communicated by Alice to Bob.

5 A suitably long shared list of random bits can be used by Alice and Bob to pick a different set of m bits
at each round of the guessing game, for some finite set of rounds.

So if Alice has a data set of N uniformly and independently distributed bits and sends 
Bob one bit of information, and Bob can exploit previously established correlations to 
increase his ability to correctly guess an arbitrary bit in the data set, his information 
gain about an arbitrary bit in the data set can be no more than 1/N bits, i.e., the binary 
entropy of the probability of a correct guess cannot be less than 1 - 1/N.

The correlations of a PR-box are not informationally neutral in this sense. While 
they are logically admissible, they are 'too good to be true' in the way they allow the 
solution of the following two distributed tasks: 

• The 'dating game': Alice and Bob would like to go on a date, but only if they 
know that they both like each other. In other words, they would like to compute 
a function that takes the value 1 if they both like each other (i.e., if both inputs 
to the function are 1), but takes the value 0 if at least one party does not like the
other (i.e., if the inputs are both 0, or one input is 0 and the other input is 1). Now,
in the real world, there is no way they can do this without revealing information 
that they both want to keep private: Alice does not want Bob to know that she 
likes him if he does not like her, and similarly for Bob. With a PR-box, they can 
compute this function, while keeping private the information they want to keep 
private. Alice and Bob input 0 or 1 into their inputs to the PR-box when they
are separate (so neither party sees the other's input). They then come together 
and share the outputs. If the outputs are different, they know that both inputs 
were 1, so they happily go on a date. In this case, of course, Alice knows that 
Bob likes her, and Bob knows that Alice likes him, but that's fine. If the outputs 
are the same, they know only that either Alice did not like Bob, or that Bob did 
not like Alice, or that the dislike was mutual. While Alice can infer that Bob 
does not like her if she likes him, this knowledge is private, so Alice avoids any 
humiliation; and similarly for Bob. 

• 'One-out-of-two' oblivious transfer: Alice has a data set consisting of two bits 
of information. The constraint on Alice is that she can send Bob one bit of in- 
formation. The requirement for Bob is that he uses the one bit of communicated 
information to correctly guess whichever bit he chooses in Alice's data set, in 
such a way that Alice is oblivious of his choice. Again, there is no way to do 
this in the real world, but if Alice and Bob have access to a PR-box they can 
successfully achieve this task. The protocol is the same as the protocol for the 
N = 2 case discussed in Section 2. 
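Both tasks can be illustrated with a classical simulation of the PR-box statistics. This is only a sketch: the simulation computes both outputs in one place, with access to both inputs, so it reproduces the correlations of equation (1) but is not a nonlocal device. The oblivious transfer protocol shown is the standard one (Alice inputs the XOR of her two bits and sends her first bit XORed with her output). It is assumed here to match the N = 2 protocol of Section 2. The function names are illustrative.

```python
import random

def pr_box(a: int, b: int) -> tuple[int, int]:
    """Sample PR-box outputs (A, B) satisfying A XOR B = a AND b.

    A is uniformly random, so each party's output taken alone carries
    no information about the other party's input.
    """
    A = random.randint(0, 1)
    B = A ^ (a & b)
    return A, B

def dating_game(alice_likes: int, bob_likes: int) -> bool:
    """Return True iff the shared outputs differ, i.e. both inputs are 1."""
    A, B = pr_box(alice_likes, bob_likes)
    return A != B

def oblivious_transfer(x0: int, x1: int, choice: int) -> int:
    """Bob's guess for bit x_choice after one communicated bit.

    Assumed protocol: Alice inputs x0 XOR x1, Bob inputs his choice,
    Alice sends x0 XOR A, and Bob XORs the message with his output B.
    """
    A, B = pr_box(x0 ^ x1, choice)
    message = x0 ^ A      # the single bit Alice sends to Bob
    return message ^ B    # x0 if choice == 0, x1 if choice == 1
```

In the transfer, Bob's guess is x0 XOR (x0 XOR x1)·choice, which is x0 for choice 0 and x1 for choice 1, while Alice sees only her own uniformly random output and so learns nothing about the choice.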

The remarkable result of Pawlowski et al shows that, while quantum correlations
are 'more like' PR-box correlations than classical correlations, insofar as they increase
the ability of Alice and Bob to perform distributed tasks relative to classical correlations,
they represent the limit of what is possible if correlations are 'informationally
neutral,' in the sense that correlations established prior to the choice of a data set can
contain no information about such a data set, and hence should not be able to be exploited
to allow a party who has no access to the data set to correctly guess any arbitrary
bit in the set. This considerably extends related results by van Dam [9, 10], Brassard
et al, and Linden et al [5]. Note that there are other results in which nonlocal boxes are
exploited to derive the Tsirelson bound. See Skrzypczyk et al [8], in which a dynamics
is defined for PR-boxes and the Tsirelson bound is derived from a condition called
'nonlocality swapping.'

Pawlowski et al [6, pp. 1103-1104] conclude with the following remarks:

In conclusion, we have identified the principle of Information Causal- 
ity, which precisely distinguishes physically realized correlations from 
nonphysical ones (in the sense that quantum mechanics cannot reach them). 
It is phrased in operational terms and in a theory-independent way and 
therefore we suggest it is at the same foundational level as the no-signaling 
condition itself, of which it is a generalization. 

The new principle is respected by all correlations accessible with quan- 
tum physics while it excludes all no-signaling correlations, which violate 
the quantum Tsirelson bound. Among the correlations that do not vio- 
late that bound it is not known whether Information Causality singles out 
exactly those allowed by quantum physics. If it does, the new principle 
would acquire even stronger status. 

Classical correlations bounded by E ≤ 1/2 can be associated with a polytope, where
the vertices represent 'no signaling' deterministic states. For example, in the case considered
above for a bipartite system with two binary-valued quantities, the deterministic
state in which the values of the two quantities are both zero, for all four possible combinations
of inputs, is given by Table 2. There are 16 'no signaling' deterministic states (each of
which can be represented as a product of local states, an Alice deterministic state and
a Bob deterministic state) out of 256 possible deterministic states; the remaining 240
deterministic states allow signaling. The 16-vertex classical polytope is included in a
24-vertex 'no signaling' nonlocal polytope, where the vertices are the 16 'no signaling'
deterministic states and 8 additional PR-box states, represented by the probabilities in
Table 1, or probabilities obtained from Table 1 by relabeling the a-inputs, and the A-outputs
conditionally on the a-inputs, and the b-inputs, and the B-outputs conditionally
on the b-inputs. Quantum correlations bounded by E ≤ E_T = 1/√2 are associated with
a spherical convex set with extremal points between the 16-vertex classical polytope
and the 24-vertex 'no signaling' nonlocal polytope.

        a = 0                          a = 1
b = 0   p(00|00) = 1   p(10|00) = 0    p(00|10) = 1   p(10|10) = 0
        p(01|00) = 0   p(11|00) = 0    p(01|10) = 0   p(11|10) = 0
b = 1   p(00|01) = 1   p(10|01) = 0    p(00|11) = 1   p(10|11) = 0
        p(01|01) = 0   p(11|01) = 0    p(01|11) = 0   p(11|11) = 0

Table 2: A deterministic state
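The counts of deterministic states quoted above can be reproduced by brute-force enumeration. The sketch below assumes that a deterministic state assigns one output pair (A, B) to each input pair (a, b), and that 'no signaling' means Alice's output does not depend on b and Bob's output does not depend on a. The helper names are illustrative.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))    # all (a, b) pairs
OUTPUTS = list(product((0, 1), repeat=2))   # all (A, B) pairs

def is_no_signaling(state: dict) -> bool:
    """state maps (a, b) -> (A, B); A must ignore b and B must ignore a."""
    alice_ok = all(state[(a, 0)][0] == state[(a, 1)][0] for a in (0, 1))
    bob_ok = all(state[(0, b)][1] == state[(1, b)][1] for b in (0, 1))
    return alice_ok and bob_ok

def deterministic_states():
    """Yield all 4**4 = 256 assignments of an output pair to each input pair."""
    for outs in product(OUTPUTS, repeat=len(INPUTS)):
        yield dict(zip(INPUTS, outs))

def pr_matches(state: dict) -> int:
    """Number of input pairs for which A XOR B = a AND b holds."""
    return sum((A ^ B) == (a & b) for (a, b), (A, B) in state.items())

all_states = list(deterministic_states())
local_states = [s for s in all_states if is_no_signaling(s)]
signaling_states = [s for s in all_states if not is_no_signaling(s)]
best_local = max(pr_matches(s) for s in local_states)

print(len(all_states), len(local_states), len(signaling_states), best_local)
```

The enumeration also shows that no 'no signaling' deterministic state satisfies the PR-box condition for more than three of the four input pairs, which is why the PR-box states lie outside the classical polytope.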

The open question is whether non-quantum correlations represented by points out- 
side the quantum convex set but below the Tsirelson bound can also be excluded by 
information causality. For a discussion, see Allcock et al [1].



5 Appendix 



In [6], the authors prove quite generally that information causality is satisfied for any
'no signaling' theory satisfying three constraints on mutual information (consistency
with the classical Shannon mutual information, the data-processing inequality, and the
chain rule), hence for quantum information, which satisfies the constraints. It follows
that information causality is satisfied at and below the Tsirelson bound.

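The following is a simple direct proof (see Section 3) that if $E = E_T = 1/\sqrt{2}$, then:

$$h\left(\tfrac{1}{2}(1 + E^n)\right) \geq 1 - \frac{1}{2^n} \tag{41}$$

i.e.,

$$-\tfrac{1}{2}(1 + E^n)\log\left(\tfrac{1}{2}(1 + E^n)\right) - \tfrac{1}{2}(1 - E^n)\log\left(\tfrac{1}{2}(1 - E^n)\right) \geq 1 - \frac{1}{2^n} \tag{42}$$

After a little algebra, this can be expressed as:

$$\log(1 - E^{2n}) + E^n \log\frac{1 + E^n}{1 - E^n} \leq \frac{1}{2^{n-1}} \tag{43}$$

Note that the logarithms are to the base 2.

Now, if $-1 < x < 1$:

$$\log_e(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \tag{44}$$

$$\log_e\frac{1 + x}{1 - x} = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots\right) \tag{45}$$

Applying (44) with $x = -E^{2n}$ and (45) with $x = E^n$:

$$\log_e(1 - E^{2n}) + E^n \log_e\frac{1 + E^n}{1 - E^n} = E^{2n} + \frac{1}{6}E^{4n} + \frac{1}{15}E^{6n} + \cdots + \frac{1}{m(2m-1)}E^{2mn} + \cdots \tag{46}$$

Substituting $E = E_T = 1/\sqrt{2}$ this becomes:

$$\frac{1}{2^n} + \frac{1}{6}\cdot\frac{1}{2^{2n}} + \frac{1}{15}\cdot\frac{1}{2^{3n}} + \cdots + \frac{1}{m(2m-1)}\cdot\frac{1}{2^{mn}} + \cdots \tag{47}$$

Since $\log_2 x = \log_2 e \cdot \log_e x$, it follows that $\log(1 - E^{2n}) + E^n \log\frac{1 + E^n}{1 - E^n}$, where
the logarithms are to the base 2, can be expressed as the following infinite series:

$$\log_2 e\left(\frac{1}{2^n} + \frac{1}{6}\cdot\frac{1}{2^{2n}} + \frac{1}{15}\cdot\frac{1}{2^{3n}} + \cdots + \frac{1}{m(2m-1)}\cdot\frac{1}{2^{mn}} + \cdots\right) \tag{48}$$

so the inequality (43) we are required to prove becomes:

$$\frac{1}{2^n} + \frac{1}{6}\cdot\frac{1}{2^{2n}} + \frac{1}{15}\cdot\frac{1}{2^{3n}} + \cdots + \frac{1}{m(2m-1)}\cdot\frac{1}{2^{mn}} + \cdots \leq \log_e 2 \cdot \frac{1}{2^{n-1}} \tag{49}$$

or, multiplying both sides by $2^{n-1}$:

$$\frac{1}{2} + \frac{1}{6}\cdot\frac{1}{2^{n+1}} + \frac{1}{15}\cdot\frac{1}{2^{2n+1}} + \cdots + \frac{1}{m(2m-1)}\cdot\frac{1}{2^{(m-1)n+1}} + \cdots \leq \log_e 2 \approx .693147 \tag{50}$$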

This is clearly the case. The largest value of the series is obtained for n = 1, when the 
first term is .5. The remaining terms affect only the second and later decimal places. 
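Alternatively, from (44), which also holds at x = 1, we have:

$$\log_e 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \tag{51}$$

so, pairing consecutive terms of (51) and subtracting the series on the left hand side of
the inequality (50) from the series for log_e 2, what has to be proved is that, for any n:

$$\left(\frac{1}{2} - \frac{1}{2}\right) + \left(\frac{1}{12} - \frac{1}{6}\cdot\frac{1}{2^{n+1}}\right) + \left(\frac{1}{30} - \frac{1}{15}\cdot\frac{1}{2^{2n+1}}\right) + \cdots + \left(\frac{1}{2m(2m-1)} - \frac{1}{m(2m-1)}\cdot\frac{1}{2^{(m-1)n+1}}\right) + \cdots \geq 0 \tag{52}$$

This is obvious by inspection, since each negative term in parentheses is no larger than
its positive predecessor, for any n.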
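The inequality can also be checked numerically. The sketch below assumes E_T = 1/√2 and takes the left-hand side of (50) to be the series whose m-th term is 1/(m(2m-1)) · 1/2^((m-1)n+1), as in the derivation above. The function names and the truncation at 60 terms are illustrative choices.

```python
import math

E_T = 1 / math.sqrt(2)  # value of E at the Tsirelson bound

def binary_entropy(p: float) -> float:
    """Shannon binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_margin(n: int, E: float = E_T) -> float:
    """h((1 + E**n)/2) - (1 - 2**-n); non-negative iff (41) holds for this n."""
    return binary_entropy((1 + E**n) / 2) - (1 - 2.0 ** -n)

def series_50(n: int, terms: int = 60) -> float:
    """Partial sum of the left-hand side of (50), truncated after `terms` terms."""
    return sum(
        1 / (m * (2 * m - 1)) * 2.0 ** -((m - 1) * n + 1)
        for m in range(1, terms + 1)
    )

for n in range(1, 11):
    print(n, entropy_margin(n), series_50(n), math.log(2))
```

The margin stays positive for every n tried, while a value of E above E_T, such as 0.75, gives a negative margin for some n, consistent with the claim that information causality is violated for some n if E > E_T.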